44 research outputs found

The Helmholtz Analytics Toolkit (HEAT): A scientific Big Data Library for HPC

Author: Comito Claudia
Götz Markus
Hagemeier Björn
Knechtges Philipp
Krajsek Kai
Siggel Martin
Publication date: 07/01/2019

KITopen

Heat - A Distributed and Accelerated Tensor Framework for Data Analytics and Machine Learning

Author: Basermann Achim
Comito Claudia
Coquelin Daniel
Debus Charlotte
Götz Markus
Hagemeier Björn
Knechtges Philipp
Krajsek Kai
Siggel Martin
Streit Achim
Tarnawa Michael
Publication date: 01/12/2021

KITopen

HeAT -- a Distributed and GPU-accelerated Tensor Framework for Data Analytics

Author: Basermann Achim
Comito Claudia
Coquelin Daniel
Debus Charlotte
Götz Markus
Hagemeier Björn
Hanselmann Simon
Knechtges Philipp
Krajsek Kai
Siggel Martin
Streit Achim
Tarnawa Michael
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

To cope with the rapid growth in available data, the efficiency of data analysis and machine learning libraries has recently received increased attention. Although great advancements have been made in traditional array-based computations, most are limited by the resources available on a single computation node. Consequently, novel approaches must be developed to exploit distributed resources, e.g., distributed memory architectures. To this end, we introduce HeAT, an array-based numerical programming framework for large-scale parallel processing with an easy-to-use NumPy-like API. HeAT utilizes PyTorch as a node-local eager execution engine and distributes the workload on arbitrarily large high-performance computing systems via MPI. It provides both low-level array computations and assorted higher-level algorithms. With HeAT, it is possible for a NumPy user to take full advantage of their available resources, significantly lowering the barrier to distributed data analysis. When compared to similar frameworks, HeAT achieves speedups of up to two orders of magnitude.

Comment: 10 pages, 8 figures, 5 listings, 1 table

arXiv.org e-Print Archive

Institute of Transport Research:Publications

KITopen

Juelich Shared Electronic Resources

HeAT – a Distributed and GPU-accelerated Tensor Framework for Data Analytics

Author: Basermann Achim
Comito Claudia
Coquelin Daniel
Debus Charlotte
Götz Markus
Hagemeier Björn
Hanselmann Simon
Knechtges Philipp
Krajsek Kai
Siggel Martin
Streit Achim
Tarnawa Michael
Publication date: 10/09/2020

To cope with the exponential growth in available data, the efficiency of data analysis and machine learning libraries has recently received increased attention. Although corresponding array-based numerical kernels have been significantly improved, most are limited by the resources available on a single computational node. Consequently, kernels must exploit distributed resources, e.g., distributed memory architectures. To this end, we introduce HeAT, an array-based numerical programming framework for large-scale parallel processing with an easy-to-use NumPy-like API. HeAT utilizes PyTorch as a node-local eager execution engine and distributes the workload via MPI on arbitrarily large high-performance computing systems. It provides both low-level array-based computations and assorted higher-level algorithms. With HeAT, it is possible for a NumPy user to take advantage of their available resources, significantly lowering the barrier to distributed data analysis. Compared with applications written in similar frameworks, HeAT achieves speedups of up to two orders of magnitude.

KITopen

The Helmholtz Analytics Toolkit (Heat) and its role in the landscape of massively-parallel scientific Python

Author: Comito Claudia
Gutiérrez Hermosillo Muriedas Juan Pedro
Götz Markus
Hagemeier Björn
Hoppe Fabian
Knechtges Philipp
Krajsek Kai
Rüttgers Alexander
Streit Achim
Tarnawa Michael
Publication date: 01/08/2023

When it comes to enhancing exploitation of massive data, machine learning methods are at the forefront of researchers’ awareness. Much less so is the need for, and the complexity of, applying these techniques efficiently across large-scale, memory-distributed data volumes. In fact, these aspects, typical of handling massive data sets, pose major challenges to the vast majority of research communities, in particular to those without a background in high-performance computing. Often, the standard approach involves breaking up and analyzing data in smaller chunks; this can be inefficient and error-prone, and sometimes it is not appropriate at all because the context of the overall data set can get lost. The Helmholtz Analytics Toolkit (Heat) library offers a solution to this problem by providing memory-distributed and hardware-accelerated array manipulation, data analytics, and machine learning algorithms in Python. The main objective is to make memory-intensive data analysis possible across various fields of research, in particular for domain scientists who are not experts in traditional high-performance computing but nevertheless need to tackle data analytics problems that go beyond the capabilities of a single workstation. The development of this interdisciplinary, general-purpose, and open-source scientific Python library started in 2018 and is based on a collaboration of three institutions of the Helmholtz Association (German Aerospace Center DLR, Forschungszentrum Jülich FZJ, Karlsruhe Institute of Technology KIT). The pillars of its development are:
- to enable memory distribution of n-dimensional arrays,
- to adopt PyTorch as the process-local compute engine (hence supporting GPU acceleration),
- to provide memory-distributed (i.e., multi-node, multi-GPU) array operations and algorithms, optimizing asynchronous MPI communication (based on mpi4py) under the hood, and
- to wrap functionalities in a NumPy- or scikit-learn-like API so that existing applications can be ported with minimal changes and non-experts in HPC can use the library.

In this talk we will give an illustrative overview of the current features and capabilities of our library. Moreover, we will discuss its role in the existing ecosystem of distributed computing in Python, and we will address technical and operational challenges in further development.

Institute of Transport Research:Publications

Building Blocks for Computer Vision with Stochastic Partial Differential Equations

Author: A. Bruhn
A. J. Chorin
B. Jähne
B. K. P. Horn
C. Fermüller
D. A. Forsyth
D. B. Xiu
D. Lucor
F. Catté
F. H. Maltz
H. Haussecker
H. Scharr
Hanno Scharr
J. K. Kearney
J. Weber
J. Weickert
K. Mikula
Kai Krajsek
M. Avriel
M. K. Deb
M. T. Reagan
N. Papenberg
N. Wiener
O. P. Le Maître
P. Malliavin
P. Perona
P. S. Laplace de
R. G. Ghanem
Robert M. Kirby
S. Huffel Van
S. Kichenassamy
T. Amiaz
T. Iijima
Tobias Preusser
V. A. Narayanan
V. Thomee
W. C. Meecham
Y. Bao
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Signal and noise adapted filters for differential motion estimation

Author: Kai Krajsek
Rudolf Mester

CiteSeerX

The Helmholtz Analytics Toolkit (HeAT) - A Scientific Big Data Library for HPC -

Author: Krajsek Kai
Publication date: 01/01/2019

This talk presents the Helmholtz Analytics Toolkit (HeAT), an HPC data analytics library for scientific applications. HeAT builds on top of PyTorch, which provides many required features such as automatic differentiation, CPU and GPU support, linear algebra operations, and basic MPI functionalities. However, distributed computations must be designed by hand for each basic communication, and PyTorch implements only a subset of MPI functionalities. HeAT starts at this point, providing a distributed tensor data object on which operations can be performed. The tensor data objects reside either on the CPU or on the GPU and, if desired, are distributed over several nodes. Operations on tensor objects are transparent to the user, i.e., they remain the same irrespective of whether the HeAT data object resides on a single node or is distributed over several nodes. On the basis of this core structure, HeAT implements typical data analytics methods motivated by various scientific use cases. After motivating the framework and specifying its scope, the talk describes its concept and its realization in detail. The presentation demonstrates the usage of HeAT by means of several typical examples from data analytics. The presentation closes with a discussion of the downsides, further developments, and future challenges of HeAT.
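
The idea of a tensor that is split across nodes while operations give the same result as on a single node can be illustrated with a small single-process sketch. This is not HeAT's API: the "ranks" are simulated with a plain Python list, no MPI or PyTorch is involved, and the names `split_chunks` and `distributed_sum` are hypothetical helpers introduced only for this illustration.

```python
# Single-process illustration only: "ranks" are simulated with a list.
# No MPI or PyTorch is involved, and none of these names are HeAT API.

def split_chunks(data, n_ranks):
    """Partition `data` into n_ranks contiguous chunks of near-equal size."""
    base, extra = divmod(len(data), n_ranks)
    chunks, start = [], 0
    for rank in range(n_ranks):
        # The first `extra` ranks receive one additional element.
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def distributed_sum(chunks):
    """Reduce each chunk locally, then combine the partial results."""
    local_sums = [sum(chunk) for chunk in chunks]  # node-local step
    return sum(local_sums)  # stands in for a global reduction across ranks


data = list(range(10))
chunks = split_chunks(data, 3)
total = distributed_sum(chunks)
```

The result of `distributed_sum` does not depend on the number of simulated ranks, which mirrors the transparency property described in the abstract: the caller sees the same answer whether the data sits in one chunk or several.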

Juelich Shared Electronic Resources

The Edge Preserving Wiener Filter for Scalar and Tensor Valued Images

Author: Kai Krajsek
Rudolf Mester

This contribution presents a variation of the Wiener filter criterion, i.e., minimizing the mean squared error, by combining it with the main principle of normalized convolution, i.e., the introduction of prior information into the filter process via the certainty map. Thus, we are able to optimize a filter according to the signal and noise characteristics while preserving edges in images. In spite of its low computational cost, the proposed filter scheme outperforms state-of-the-art filter methods that also work in the spatial domain. Furthermore, the Wiener filter paradigm is extended from scalar-valued data to tensor-valued data.
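
The normalized-convolution principle mentioned in the abstract, weighting each sample by a certainty value and renormalizing, can be sketched in one dimension as follows. This is only an illustration of that general principle under simple assumptions (a symmetric kernel, a hand-chosen certainty map); it is not the authors' edge-preserving Wiener filter, and the function name and example values are hypothetical.

```python
def normalized_convolution(signal, certainty, kernel):
    """Certainty-weighted local average of a 1D signal.

    Each output sample is sum(w * c * s) / sum(w * c) over the kernel window,
    so samples with certainty 0 do not influence the result.
    Assumes a symmetric kernel of odd length.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        num = 0.0
        den = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):  # ignore positions outside the signal
                num += w * certainty[j] * signal[j]
                den += w * certainty[j]
        # If no certain sample falls inside the window, fall back to 0.0.
        out.append(num / den if den > 0 else 0.0)
    return out


# The outlier at index 2 is marked as fully uncertain.
signal = [1.0, 1.0, 100.0, 1.0, 1.0]
certainty = [1.0, 1.0, 0.0, 1.0, 1.0]
kernel = [1.0, 2.0, 1.0]
smoothed = normalized_convolution(signal, certainty, kernel)
```

Because the outlier carries zero certainty, it is excluded from every weighted average, and the smoothed signal is reconstructed from its trusted neighbours only.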

CiteSeerX

Comparing model evaluation by different advanced machine learning approaches

Author: Elbern Hendrik
Krajsek Kai
Publication date: 01/01/2018

Juelich Shared Electronic Resources